Medical Decision Making
SAGE Publications
Preprints posted in the last 90 days, ranked by how well they match Medical Decision Making's content profile, based on 10 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Gracia, V.; Goldhaber-Fiebert, J. D.; Alarid-Escudero, F.
Purpose: We introduce PRE-CISE, a pre-calibration workflow that integrates coverage analysis, local sensitivity, and collinearity diagnostics to streamline model calibration and transparently address nonidentifiability. We demonstrate the benefits of PRE-CISE using a four-state Sick-Sicker Markov testbed and a COVID-19 case study. Methods: PRE-CISE begins with a coverage analysis to verify that model outputs generated with parameter sets drawn from their prior distributions span the calibration targets, followed by local sensitivity analyses to quantify the influence of parameters on model outputs, guiding the resizing of prior distribution bounds to improve coverage. Identifiability is then assessed via collinearity analysis; large indices indicate practical nonidentifiability. For the testbed model, we calibrated 3 parameters to survival, prevalence, and the proportion of Sick to Sicker at 10, 20, and 30 years. For the COVID-19 model, we calibrated 11 parameters to match daily confirmed incident cases. Bayesian calibration was conducted for both analyses. Results: Coverage analyses flagged initial misfits; local sensitivities identified that the Sick-to-Sicker transition probability has the greatest effect on model outputs, and resizing its prior distribution bounds improved coverage. Collinearity analyses showed that combining multiple calibration targets across time points enabled recovery of all three parameters. In the COVID-19 model, local sensitivity analyses prioritized time-varying detection rates and contact-reduction effects, reducing the search space and thereby improving calibration efficiency. Daily incident case calibration targets yielded collinearity indices below practical thresholds (e.g., < 15) for all parameter combinations, whereas indices for weekly calibration targets were larger and closer to the cutoff.
Conclusions: PRE-CISE provides a practical, transparent pathway that helps modelers refine prior distribution bounds and calibration targets before intensive calibration, improving uncertainty reporting and strengthening the reliability of model-based health policy analyses.
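The collinearity diagnostic described above is typically the index of Brun and colleagues: normalize the columns of the local sensitivity matrix and take one over the square root of the smallest eigenvalue of the resulting Gram matrix. A minimal numpy sketch, with made-up toy sensitivity matrices (the abstract's practical cutoff of 15 is used as the flag):

```python
import numpy as np

def collinearity_index(S):
    """Collinearity index (Brun et al.): gamma = 1/sqrt(lambda_min), where
    lambda_min is the smallest eigenvalue of S_norm^T S_norm and S_norm has
    unit-length columns (one column per parameter, one row per output).
    Large gamma (e.g. > 15) signals practical nonidentifiability."""
    S = np.asarray(S, dtype=float)
    S_norm = S / np.linalg.norm(S, axis=0)           # unit-length columns
    eigvals = np.linalg.eigvalsh(S_norm.T @ S_norm)  # symmetric Gram matrix
    return 1.0 / np.sqrt(eigvals.min())

# Two nearly proportional sensitivity columns -> near-collinear, large index
S_bad = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 6.0]])
# Orthogonal columns -> index of exactly 1 (fully identifiable pair)
S_good = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(collinearity_index(S_bad), collinearity_index(S_good))
```

Here the nearly proportional pair yields an index far above 15, while the orthogonal pair yields 1, mirroring how the workflow flags parameter combinations that calibration targets cannot jointly recover.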
ORWA, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.
When randomized controlled trials are impractical, interrupted time series (ITS) designs offer a rigorous quasi-experimental approach for assessing population-level policies, and ITS is commonly regarded as the most robust of the quasi-experimental designs (QEDs). However, ITS designs are susceptible to serial correlation and to confounding by time-varying factors associated with both the intervention and the outcome, which may result in biased inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimating policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches across a variety of data-generating scenarios differing in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed using bias, standard-error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point-estimate performance was similar, inferential properties varied substantially. CITS consistently had smaller mean squared error, better agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level under weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, even with Newey-West adjustments.
These findings demonstrate the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
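A data-generating process of the kind the simulations describe, a count outcome with a step intervention effect and lag-1 autocorrelated noise on the log scale, can be sketched as follows (parameter values are illustrative, not the study's scenarios; the naive estimator here ignores trends, controls, and serial correlation, which is exactly the inferential weakness the study probes):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_its(n_pre=24, n_post=24, beta_level=0.5, rho=0.6, mu0=50):
    """Simulate a count series with an intervention level change (beta_level,
    on the log scale) and AR(1) errors with lag-1 autocorrelation rho."""
    n = n_pre + n_post
    post = (np.arange(n) >= n_pre).astype(float)
    e = np.zeros(n)
    for t in range(1, n):                 # lag-1 autocorrelated log-rate noise
        e[t] = rho * e[t - 1] + rng.normal(0, 0.1)
    lam = mu0 * np.exp(beta_level * post + e)
    return rng.poisson(lam), post

def naive_level_change(y, post):
    """Naive log-scale level-change estimate, ignoring serial correlation."""
    return np.log(y[post == 1].mean()) - np.log(y[post == 0].mean())

# Averaged over replications, the point estimate is near the true 0.5 even
# though single-series standard errors would be miscalibrated under rho > 0.
ests = [naive_level_change(*simulate_its()) for _ in range(200)]
print(float(np.mean(ests)))
```

This matches the abstract's headline pattern: point estimates can be roughly unbiased while valid inference still requires modeling the serial dependence (or a concurrent control series).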
Hassoon, A.; Peng, X.; Irimia, R.; Lianjie, A.; Leo, H.; Bandeira, A.; Woo, H. Y.; Dredze, M.; Abdulnour, R.-E.; McDonald, K. M.; Peterson, S.; Newman-Toker, D.
Background: Diagnostic errors are a leading cause of preventable patient harm, often occurring during early clinical encounters where diagnostic uncertainty is maximal. Large language models (LLMs) have shown potential in medical reasoning, yet their ability to function as a diagnostic safety net, specifically by identifying and correcting human diagnostic errors, remains systematically unquantified. We evaluated whether state-of-the-art LLMs can effectively challenge, rather than merely confirm, an erroneous physician diagnosis. Methods: We evaluated 16 leading LLMs (including GPT-o1, Gemini 2.5 Pro, and Claude 3.7 Sonnet) using 200 standardized clinical vignettes representing 20 high-stakes, frequently misdiagnosed conditions. Models were presented with the full clinical record and an incorrect physician diagnosis. Primary outcomes included the diagnostic correction rate (disagreeing with the error and providing the correct diagnosis) and the ratio of correction to error detection. We further tested model robustness by generating 2,200 variants to assess the influence of demographic (race/ethnicity) and contextual (institutional reputation, training level, insurance) tokens. Results: Diagnostic correction rates varied significantly across models. Gemini 2.5 Pro demonstrated the highest performance, correcting the physician's error in 55.0% of cases (n=110/200), followed by Claude Sonnet 3.5 (48.5%) and Sonnet 4 (47.0%). In contrast, DeepSeek V3 corrected only 20.0% of cases. Failures were strikingly consistent at the disease level; most models failed to correct errors in syphilis, spinal epidural abscess, and myocardial infarction. Furthermore, several models exhibited confirmation bias, agreeing with the incorrect diagnosis in 11.0% to 50.0% of cases. Stability across demographic and contextual variants was inconsistent, with some models showing spurious performance shifts based on non-clinical tokens.
Conclusion: While top-performing LLMs can intercept approximately half of human diagnostic errors in high-stakes scenarios, performance is heterogeneous and highly sensitive to non-clinical context. Current models exhibit significant disease-specific gaps and a tendency toward confirmation bias, suggesting that their safe clinical integration requires adversarial, multi-agent workflows designed to prioritize skepticism over baseline agreement.
Jafari, H.; Chu, P.; Lange, M.; Maher, F.; Glen, C.; Pearson, O. J.; Burges, C.; Martyn, M.; Cross, S.; Carter, B.; Emsley, R.; Forbes, G.
Background: Statistical Analysis Plans (SAPs) are essential for trial transparency and credibility but are resource-intensive to produce. While Large Language Models (LLMs) have shown promise in drafting protocols, their ability to generate high-quality, protocol-compliant SAPs remains untested against current content guidance. This study developed and validated an LLM-based pipeline for drafting SAPs from clinical trial protocols. Methods: We developed a structured, section-by-section prompting pipeline aligned with standard SAP guidance. We applied this pipeline to nine clinical trial protocols using three leading LLMs: OpenAI GPT-5, Anthropic Claude Sonnet 4, and Google Gemini 2.5 Pro. The resulting 27 SAPs were evaluated against a 46-item quality checklist derived from published SAP guidelines. Items were double-scored by independent trial statisticians on a 0 to 3 scale for accuracy. We compared performance across LLMs and between item types (descriptive vs. statistical reasoning) using mixed-effects logistic regression. Results: Across the 9 trials, the models produced SAP drafts with high overall accuracy (77% to 78%); performance did not differ among the three LLMs (p=0.79) but varied by content type (p < 0.001). All models performed well on descriptive items (e.g., administrative details, trial design), with lower accuracy for items requiring statistical reasoning (e.g., modelling strategies, sensitivity analyses). Accuracy for statistical items ranged from 67% to 72%, whereas descriptive items achieved 81% to 83% accuracy. Qualitatively, models were prone to specific failure modes in complex sections, such as omitting necessary details for secondary outcome models or hallucinating sensitivity analyses. Discussion: Current LLMs can effectively draft portions of SAPs, offering the potential for substantial time savings in trial documentation.
However, a human-in-the-loop approach remains mandatory; while models demonstrate strong capability in producing descriptive content, their independent application to complex statistical methodology design still requires further methodological development and training. Future work should explore advanced prompt engineering, such as retrieval-augmented generation or agentic workflows, to improve reasoning capabilities.
Nichols, B. E.; Wonderly Trainor, B.; Hampson, G.; Grad, Y. H.; Klausner, J. D.
Background: Rising antimicrobial resistance in Neisseria gonorrhoeae threatens the effectiveness of existing therapies. Resistance-guided treatment (RGT) may reduce treatment failures, complications, and inappropriate use of last-line agents while slowing resistance emergence. Methods and Findings: We developed an individual-level stochastic simulation model of gonorrhea diagnosis and treatment in the United States, incorporating infection prevalence, symptom status, diagnostic accuracy, resistance profiles, treatment pathways, and partner management (costs in 2025 USD). We evaluated three resistance testing strategies (ciprofloxacin-only, ciprofloxacin + ceftriaxone, and triple-target, which includes a novel drug A) across a wide range of resistance scenarios. We quantified economic value across three dimensions: (1) per-episode direct medical cost savings, (2) system-level costs attributable to ceftriaxone resistance emergence among MSM, and (3) avoided costs of new antibiotic development, estimating the maximum per-test price at which RGT remains cost-neutral. Per-episode cost-neutrality thresholds ranged from near $0 when ceftriaxone resistance was absent to up to $45/test at 15% ceftriaxone resistance. At 50% ciprofloxacin and 5% ceftriaxone resistance, the population-weighted threshold was $4 (95% UI: $3-$8) for a CIP-only test and $11 (95% UI: $5-$14) for a triple-target test. Among MSM, incorporating system-level resistance emergence costs and avoided antibiotic development costs increased the total per-test value to $35-$145 for a single-target test and $84-$128 for a triple-target test, depending on whether prescribing practices shift when ceftriaxone resistance reaches 5%.
Conclusions: Resistance-guided therapy offers economic benefits across multiple dimensions even at relatively high diagnostic prices, supporting investment in gonorrhea resistance testing to improve partner outcomes, delay resistance emergence, and enhance the long-term cost-efficiency of gonorrhea management.
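The population-weighted threshold quoted above can be illustrated as a weighted average of subgroup-specific cost-neutrality prices; the weights and per-group values below are hypothetical placeholders, not the study's inputs:

```python
# Hypothetical subpopulation weights and per-episode cost-neutrality
# thresholds (USD per test). A test is population cost-neutral when its
# price equals the weighted average of averted costs per episode.
subgroups = {
    "MSM":   {"weight": 0.30, "threshold": 11.0},
    "MSW":   {"weight": 0.45, "threshold": 2.0},
    "Women": {"weight": 0.25, "threshold": 1.0},
}

weighted = sum(g["weight"] * g["threshold"] for g in subgroups.values())
print(f"population-weighted cost-neutral price: ${weighted:.2f}/test")
```

The same averaging explains why group-specific thresholds (e.g., for MSM, where resistance-emergence costs concentrate) can far exceed the population-wide figure.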
Janoudi, G.; Rada (Uzun), M.; Yasinov, E.; Richter, T.
Background: Health technology assessment (HTA) agencies issue reimbursement recommendations that determine patient access to new therapies. Predicting these outcomes would enable sponsors to optimize market access strategies and health systems to anticipate budget impacts. However, traditional machine learning approaches require extensive manual feature extraction and predict only categorical outcomes, not the specific conditions attached to recommendations. Methods: We developed Monte Carlo Committee Simulation, a neurosymbolic system that simulates multi-panelist deliberation using 14 persona-conditioned large language model panelists with weighted voting and uncertainty quantification. We conducted a temporal external validation study on CDA-AMC (Canada's Drug Agency) sponsor-submitted recommendations published between October 2024 and December 2025 (n=67), after the knowledge cutoff of the underlying models, ensuring predictions reflected reasoning rather than memorization. The system predicted both recommendation category (Reimburse with Conditions, Do Not Reimburse) and five condition categories (Population Restrictions, Prescriber/Setting Requirements, Continuation Conditions, Economic Conditions, Evidence Conditions). Results: On submissions where the system expressed confidence (n=44), recommendation prediction achieved 93.2% accuracy (95% CI: 84.1-100.0%), exceeding the 91.8% (95% CI: 83.7-98.0%) majority-class baseline. The system demonstrated superior discrimination versus chance level (AUROC 0.817, 95% CI: 0.45-0.99, vs 0.500) and calibrated confidence estimates (ECE = 0.091). Pre-specified Strength of Mandate stratified accuracy from 96.8% (High, 95% CI: 90.3-100.0%) to 40.0% (Weak, 95% CI: 0.0-80.0%), with 83.3% of errors occurring in cases flagged as uncertain (p=0.0025). Analysis of the 5 abstained cases confirmed 40.0% accuracy, validating the system's identification of uncertain predictions.
For condition prediction, the system achieved 48.8% subset accuracy, requiring correct simultaneous prediction of all 5 condition categories (2^5 = 32 possible combinations), and 86.3% Hamming accuracy versus 25.8% for a no-conditions baseline. Per-category accuracy ranged from 68.3% (Continuation Conditions) to 97.6% (Economic Conditions), with Continuation Conditions demonstrating the strongest discriminative ability (AUROC 0.896, 95% CI: 0.79-0.98). Conclusions: Monte Carlo Committee Simulation enables a shift from reactive to proactive market access: anticipating specific reimbursement conditions before committee review, with calibrated confidence that identifies which predictions to trust. Validated on temporally separated data the models could not have memorized, the system can be positioned as a forecasting aid that complements rather than replaces human deliberation.
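The distinction between subset accuracy (all five condition categories correct simultaneously) and Hamming accuracy (per-label average) can be shown with a small numpy example on made-up labels:

```python
import numpy as np

# Illustrative predictions for 4 submissions x 5 binary condition
# categories (1 = condition attached); not the study's data.
y_true = np.array([[1, 0, 1, 1, 0],
                   [1, 1, 0, 1, 1],
                   [0, 0, 0, 1, 0],
                   [1, 0, 1, 1, 1]])
y_pred = np.array([[1, 0, 1, 1, 0],   # all 5 correct -> counts toward subset
                   [1, 1, 0, 0, 1],   # 4/5 correct
                   [0, 0, 0, 1, 0],   # all 5 correct
                   [1, 1, 1, 1, 1]])  # 4/5 correct

subset_acc = (y_true == y_pred).all(axis=1).mean()  # whole row must match
hamming_acc = (y_true == y_pred).mean()             # per-label average
print(subset_acc, hamming_acc)
```

Hamming accuracy is necessarily at least as large as subset accuracy (here 0.9 vs 0.5), which is why the abstract reports both: the stricter metric reflects predicting the full condition profile at once.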
Gibson, A. D.; White, N. M.; Collins, G. S.; Barnett, A.
Clinical prediction models are often created using large, routinely collected datasets. It is essential that prediction models are developed with appropriate data and methods and are transparently reported, to ensure that decisions are based on reliable predictions. Kaggle is a popular competition website where users learn and apply analysis skills on a range of datasets. We identified two large, publicly available Kaggle datasets, on stroke and diabetes, that lack clear data provenance but are widely used for clinical prediction models in peer-reviewed publications. The authenticity of both datasets could not be verified, and both show evidence of likely being simulated or fabricated. Data provenance assessment using nine TRIPOD+AI items revealed major deficiencies, with minimal detail for either dataset, including no information on when, where, why, or how the data were collected. From these two datasets, we found 124 clinical prediction model studies. Three prediction models had evidence of use in clinical practice, one model was cited in a medical device patent, and the models were cited in 86 review articles. We recommend that journals and data repositories mandate data provenance reporting to safeguard published research. Prediction models based solely on simulated or fabricated datasets should never be used to directly inform decisions on patient care.
Demdiont, A. C.
Algorithmic decision systems mediate access to healthcare, credit, employment, and housing, yet individuals who experience adverse decisions face multi-stage barriers when seeking recourse. We formalize these barriers as a series-structured system with 11 empirically parameterized stages across three layers (data integration, data accuracy, and institutional access) and prove that single-barrier interventions are bounded by baseline system success. Under a baseline parameterization derived from federal datasets and peer-reviewed algorithmic audit studies, end-to-end recourse probability is 0.0018%. Removing any single barrier yields negligible improvement (<0.02%). Factorial decomposition reveals that the three-way cross-layer interaction accounts for 87.6% of achievable improvement, confirmed by Shapley attribution, Sobol sensitivity analysis, and bootstrap resampling (n = 1,000). These results provide a structural explanation for the limited impact of incremental reforms and support coordinated multi-layer intervention approaches for clinical AI governance and algorithmic fairness.
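The series-structure argument can be illustrated with a toy calculation: the end-to-end recourse probability is the product of stage success probabilities, so eliminating a single barrier can at best divide through by that stage's probability (stage values below are hypothetical, not the paper's parameterization):

```python
# Hypothetical stage success probabilities for a series-structured recourse
# pipeline; the paper uses 11 empirically parameterized stages.
stages = [0.60, 0.50, 0.40, 0.30, 0.50, 0.40, 0.30, 0.25, 0.20, 0.35, 0.30]

def end_to_end(probs):
    """In a series system, overall success is the product of stage successes."""
    p = 1.0
    for s in probs:
        p *= s
    return p

base = end_to_end(stages)
# Removing one barrier (setting its probability to 1.0) yields at most
# base / p_i, so the gain is capped by the remaining stages' product.
best_single = max(end_to_end(stages[:i] + [1.0] + stages[i + 1:])
                  for i in range(len(stages)))
print(f"baseline: {base:.6%}, best single-barrier fix: {best_single:.6%}")
```

Even the best single-barrier fix here leaves success below 0.006%, which mirrors the paper's structural point that incremental single-stage reforms cannot move an end-to-end probability dominated by many multiplied barriers.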
King, J. B.; Derington, C. G.; Xu, S.; Clark, N. P.; Reynolds, K.; An, J.; Witt, D. M.; O'Keeffe Rosetti, M.; Lang, D. T.; Ho, P. M.; Ozanne, E. M.; Bellows, B. K.
Background: Pharmacist-led anticoagulation management services (AMS) for direct oral anticoagulants (DOACs) reduce prescribing errors and enhance adherence, but have not demonstrated lower rates of stroke or bleeding compared with usual care, and their cost-effectiveness is unknown. We evaluated four anticoagulant strategies for patients with atrial fibrillation initiating therapy: warfarin AMS, DOAC usual care, DOAC population management tool (PMT), and DOAC AMS. Methods: We developed a Markov model with monthly cycles simulating lifetime risk of ischemic stroke, major bleeding, death, disability, and costs from a US healthcare sector perspective. Costs and outcomes were discounted 3% annually. Model probabilities were derived from a prior Kaiser Permanente comparative-effectiveness analysis; other inputs were drawn from published literature and national data. Primary outcomes were direct healthcare costs (2025 USD), quality-adjusted life years (QALYs), and incremental cost-effectiveness ratios (ICERs). Sensitivity analyses assessed parameter uncertainty. Results: DOAC-based strategies yielded greater QALYs than warfarin AMS and were cost-effective at standard willingness-to-pay thresholds. Compared with warfarin AMS, DOAC usual care gained 0.4 QALYs (ICER $89,200/QALY), DOAC PMT gained 0.6 QALYs (ICER $66,700/QALY), and DOAC AMS gained 0.6 QALYs (ICER $64,500/QALY). DOAC usual care and DOAC PMT were extendedly dominated by DOAC AMS. At $120,000/QALY, DOAC AMS was preferred in 50.4% of probabilistic iterations, DOAC PMT in 36.3%, DOAC usual care in 11.0%, and warfarin AMS in 2.3%. Results were most sensitive to DOAC program effectiveness and DOAC costs. Conclusions: Pharmacist-led DOAC management is cost-effective compared with warfarin AMS for patients with AF. These findings support broader adoption of structured DOAC management programs to optimize anticoagulation therapy.
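The ICER and extended-dominance logic in the abstract can be reproduced with a small sketch; the absolute costs and QALY totals below are invented, but the increments are chosen to match the reported $89,200/QALY and $64,500/QALY ratios:

```python
# Illustrative (cost USD, QALYs) pairs per strategy; only the increments
# relative to warfarin AMS are meaningful here.
strategies = {
    "warfarin AMS":    (100_000, 10.0),
    "DOAC usual care": (135_680, 10.4),
    "DOAC AMS":        (138_700, 10.6),
}

def icer(ref, alt):
    """Incremental cost-effectiveness ratio of alt vs ref: dCost / dQALY."""
    (c0, q0), (c1, q1) = ref, alt
    return (c1 - c0) / (q1 - q0)

i_uc = icer(strategies["warfarin AMS"], strategies["DOAC usual care"])
i_ams = icer(strategies["warfarin AMS"], strategies["DOAC AMS"])
# Extended dominance: usual care's ICER exceeds that of the more effective
# DOAC AMS strategy, so usual care drops off the efficiency frontier.
print(i_uc, i_ams, i_uc > i_ams)
```

This is why the abstract can report DOAC usual care as "extendedly dominated": a decision-maker willing to pay its ICER would always do better buying the cheaper-per-QALY, more effective DOAC AMS instead.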
Mehran, R. J.; Kuriyan, J.
Importance: Prevention-focused health policy requires analytic frameworks capable of detecting changes in population health and associated costs within policy-relevant time horizons, particularly in managed care systems where premiums reflect actuarial risk rather than realized medical expenditures. Objective: To evaluate a healthstate-based analytic framework (CareMaps) for measuring population health dynamics, disease progression, and associated costs using longitudinal Medicaid managed care claims data. Design, Setting, and Participants: Retrospective longitudinal analysis of deidentified Medicaid managed care claims in New Mexico from 2011 through 2014. The study included individuals aged 0 to 64 years enrolled in managed care plans. Exposures: Chronic disease burden categorized into mutually exclusive, ordered healthstates based on the number of chronic conditions. Main Outcomes and Measures: County- and managed care organization (MCO)-level prevalence of healthstates, transition rates between healthstates, and healthstate-specific cost estimates derived from capitation premiums and medical-loss-ratio-defined medical expenditures. Results: The CareMaps framework identified specific geographic and MCO-level variation in chronic disease prevalence, healthstate transition rates, and per-member spending patterns that were not fully explained by actuarial risk adjustment. Transitions from nonchronic to chronic healthstates varied markedly across counties, indicating heterogeneity in disease progression and prevention-related outcomes. Conclusions and Relevance: A healthstate-based analytic framework applied to longitudinal Medicaid managed care data enables standardized measurement of population health dynamics and associated costs within policy-relevant time horizons. Such approaches may support evaluation of preventive care performance, inform risk adjustment, and enhance public-sector oversight of managed care programs.
Mhatre, P.; von Rosenvinge, L.; Suresh, A.; Patzkowsky, K.; Frost, A.; Vargas, M. V.; Wu, H.; Wang, K.; Simpson, K.; Segars, J.; Singh, B.
Background: Uterine fibroids cause significant morbidity, psychosocial stress, and poor quality of life due to symptoms including heavy menstrual bleeding, anemia, pain, and bulk symptoms, as well as reproductive complications including infertility, early pregnancy loss, and preterm birth. Fibroids represent a 42.2 billion USD annual economic burden to the United States healthcare system. Despite reported delays in the diagnosis of fibroids even in symptomatic women, clinical guidelines do not recommend screening for fibroids, although high-risk patient groups are well known. Earlier detection of fibroids through ultrasound screening could allow earlier intervention with secondary prevention strategies or less invasive treatment options and improve the quality of life of women living with fibroids. Objective: The study aimed to evaluate the cost-effectiveness of annual ultrasound screening for fibroids in women aged 25-54 years in the United States. Study Design: In this economic evaluation, conducted in January-February 2026, a decision-analytic Markov model was developed from a healthcare payer perspective to analyze the cost-effectiveness of ultrasound screening for women in the United States. The time horizon was 25 to 55 years of age. Costs were adjusted for inflation to 2025 values using the medical care index of the United States consumer price index. Discounting (3% per cycle) and half-cycle corrections were applied. Deterministic and probabilistic sensitivity analyses were performed to explore uncertainty, using TreeAge Pro Healthcare software. Model variables were obtained from published literature. All women residing in the United States aged 25-54 years were assumed to have been invited to the screening program.
Results: Ultrasound screening for fibroids was found to be not only cost-effective but also cost-saving, with an incremental cost-effectiveness ratio (ICER) of -$56,605.631 per QALY (quality-adjusted life-year) gained in the base-case analysis, at a willingness-to-pay threshold of $30,000 per QALY. Ultrasound screening was cost-effective at all starting ages from 25 to 54 years, with even greater benefit at younger ages. Sensitivity analyses demonstrated the robustness of these findings across a wide range of variable values. Ultrasound screening for fibroids showed a cumulative potential to save $1,169 billion and gain 20.7 million QALYs per year compared with no screening for a population of 63.89 million American women between 25 and 54 years old. The subset of 9.32 million Black American women experienced greater benefits, with potential savings of $183 billion and a gain of 3 million QALYs. Conclusion: Based on the model-based analysis, annual ultrasound screening for uterine fibroids in women aged 25-54 years in the United States was cost-effective and cost-saving, even more so for Black women. These findings highlight the potential value of guidelines for annual ultrasound screening for fibroids, which could enable earlier diagnosis, secondary prevention, and timely intervention, with positive impact on both quality of life and healthcare costs. Tweetable Statement: Annual ultrasound screening for uterine fibroids in U.S. women aged 25-54 years was cost-effective and cost-saving.
Study at a Glance
A. Why was this study conducted?
- To evaluate whether annual ultrasound screening for fibroids in women aged 25-54 years in the U.S. is cost-effective.
B. What are the key findings?
- Annual ultrasound screening beginning at 25 years was both cost-effective and cost-saving, with an ICER of -$56,605.631/QALY for women in the US.
- Screening resulted in potential savings of $1,169 billion for US healthcare payers and 20.7 million QALYs for U.S. women.
C. What does this study add to what is already known?
- Annual ultrasound screening for fibroids is not only cost-effective but also cost-saving, highlighting its potential to reduce diagnostic delays and enable earlier, less invasive interventions.
- The results support the development and implementation of fibroid screening guidelines.
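The discounting and half-cycle correction mentioned in the Study Design can be sketched as follows; this uses the common shortcut of shifting the discount exponent by half a cycle, which is one of several ways the correction is implemented in practice:

```python
import numpy as np

def discounted_total(values, annual_rate=0.03, half_cycle=True):
    """Sum per-cycle values (annual cycles) with discounting; the half-cycle
    option approximates mid-cycle transitions by shifting the discount
    exponent by 0.5, a common shortcut for the half-cycle correction."""
    t = np.arange(len(values), dtype=float)
    if half_cycle:
        t += 0.5                      # value accrues mid-cycle, not at start
    return float(np.sum(np.asarray(values) / (1 + annual_rate) ** t))

# Illustrative: 30 annual cycles at 0.9 QALY/year, 3% discount rate.
qalys = discounted_total([0.9] * 30)
print(qalys)   # discounted total is well below the undiscounted 27 QALYs
```

The point of the correction is that state membership is counted at cycle boundaries while events occur throughout the cycle; the mid-cycle shift (or, equivalently, averaging adjacent cycle memberships) reduces the resulting bias.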
Liu, C.; Mayer, M.; Lactaoen, K.; Gomez, L.; Weissman, G.; Hubbard, R.
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-exchangeability between internal and external control patients. To address this challenge, we developed a sensitivity analysis framework to assess the robustness of HCT results to potential unmeasured confounding. We propose a tipping point analysis that adapts the E-value framework to the HCT setting where trial participation rather than treatment assignment is subject to confounding. To aid interpretation, we also introduce a data-driven benchmark representing the strength of unmeasured confounding reflected by the observed outcome non-exchangeability. We then propose an operational decision rule and evaluate its performance through simulation studies. Finally, we illustrate the approach using an asthma trial augmented by data from electronic health records. Simulation results demonstrate that our decision rule safeguards against Type I error inflation while preserving the power gains achieved by incorporating external data. In settings where moderate unmeasured confounding led to poorer outcomes for external controls, Type I error was controlled near the nominal 5% level, and power increased by 10-20% compared with analyses using RCT data alone. Our approach provides a practical, interpretable method to assess HCT robustness, supporting rigorous inference when integrating external real-world data.
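The E-value that the tipping-point analysis adapts has a closed form (VanderWeele and Ding): for an observed risk ratio RR >= 1, E = RR + sqrt(RR * (RR - 1)). A minimal sketch of the standard formula follows; the paper's specific adaptation to trial-participation (rather than treatment-assignment) confounding is not reproduced here:

```python
import math

def e_value(rr):
    """E-value for a risk ratio: the minimum strength of association, on the
    risk-ratio scale, that an unmeasured confounder would need with both the
    exposure (here, trial participation) and the outcome to fully explain
    away an observed RR. Protective RRs are inverted first by symmetry."""
    rr = max(rr, 1.0 / rr)
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(1.5), e_value(0.8))
```

An observed RR of 1.5 yields an E-value of about 2.37, meaning a confounder associated with both participation and outcome by risk ratios of 2.37 could tip the result to the null; smaller E-values mean more fragile findings.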
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI, including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]).
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
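The Wilson interval quoted for 0 policy-injection successes in 50 trials can be checked directly; the score interval remains informative at zero counts, where the naive Wald interval collapses to [0, 0]:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; well-behaved at
    0 or n successes, unlike the Wald interval."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_ci(0, 50)  # 0 unauthorized successes in 50 trials
print(lo, hi)              # reproduces the reported 95% CI of [0.00, 0.07]
```

Observing zero failures in 50 trials is therefore consistent with a true failure rate as high as roughly 7%, which is why the abstract reports the interval rather than the bare 0%.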
Maitreyi, L.; Rajagopal, S.; Anandkumar, A.; Datta, S.
India faces a mounting health crisis from antibiotic resistance, coupled with global pharmaceutical hesitancy to invest in novel antibiotic research and development (R&D), driven by complex scientific and financial hurdles. India carries one of the world's largest absolute burdens of drug-resistant infections. The combination of a huge infectious-disease caseload, rapid urbanisation, and gaps in sanitation and primary care means that, when resistance emerges, it affects far more patients and generates a much larger pool of patients needing advanced antibiotics than in many high-income countries. Against this backdrop, demand for truly novel, broad-spectrum antibiotics in India is surging, fueled by rising multidrug-resistant infections, overstretched hospitals, and an antibiotic resistance market projected to grow rapidly over the next decade. Most countries respond with incentives and subscription models; for India, the answer lies in bold, innovative revenue strategies and in prioritising the domestic launch of novel antibiotics. This paper presents an econometric analysis of the estimated valuation of a novel broad-spectrum antibiotic in India that, as a single therapeutic agent, can address several major hospital-acquired infections, including complicated urinary tract infections (cUTI), hospital-acquired pneumonia (HAP), and ventilator-associated pneumonia (VAP). The model focuses on a hypothetical "ideal" broad-spectrum intravenous antibiotic and recommends that India pioneer market entry, highlighting financial models that maximise early revenues while still hardwiring stewardship. Launching new antibiotics first in India can catalyse robust real-world use, strengthen domestic pharma, and demonstrate that the economics of antibiotic innovation are viable.
This decisive shift can transform India from a passive recipient of ageing drugs into the crucible where the next generation of life-saving antibiotics is forged, anchoring antibiotic research at the core of the country's health security and economic resilience.
Bowen, H. P.; O'Loughlin, G.; Drake, C.; Schleicher, C.; Schulthess, D.
Background: The Most Favored Nation (MFN) policy is a mechanism that incorporates foreign prices to determine the maximum allowable net price for any branded drug within US government-funded healthcare. Two proposed rules, the Global Benchmark for Efficient Drug Pricing ("GLOBE") (90 Fed. Reg. 60,244) for Medicare Part B and the Guarding US Medicare Against Rising Drug Costs ("GUARD") (90 Fed. Reg. 60,338) for Medicare Part D, invoke the Center for Medicare and Medicaid Innovation's payment and service model demonstration and waiver authority, under Section 1115A of the Social Security Act (42 U.S.C. § 1315a), to calculate the US MFN price, which is the lowest average price within a basket of specified foreign countries. Unlike voluntary manufacturer agreements, GLOBE and GUARD would mandate participation from all applicable manufacturers. Methods: We derive MFN's potential impact on Medicare pricing from a proprietary IQVIA dataset containing net prices for the top 37 oncology products by total US sales, from January 1, 2019 through June 30, 2025, in the following countries: Australia, Belgium, France, Germany, Ireland, Italy, South Africa, Spain, Switzerland, the UK, and the US. For each drug, we select the lowest GDP-adjusted international price from the subset of those countries within 60% of the US GDP per capita, adjusted for purchasing power parity, and calculate the reduction in US price required to match this MFN price, and hence the corresponding reduction in revenues under MFN. A retrospective Net Present Value (NPV) analysis is then used to address the counterfactual question of whether each drug would have been developed had MFN pricing been in place at the time of its FDA approval. Results: Under MFN, the average reduction in US prices across our drug cohort was 67%.
Eighty-four percent of the 37 cancer drugs in our cohort would have had a negative NPV if MFN had been in place at the time of their FDA approval and had also affected the commercial market. When the analysis is restricted to MFN's impact on Medicare, the indications for these lost drugs cover a total US population of 2.4 million patients. When the analysis is combined across the Medicare and commercial markets, the loss of lead indications affects over 15 million US patients. Conclusions: Mandatory MFN policies reduce the financial incentives required to develop cancer medicines; our projections show a substantial decline in new cancer drug launches and suggest companies will likely pursue indications for populations outside Medicare's authority. If so, MFN will reduce the number of new therapies for the very population the Executive Orders are ostensibly designed to aid: the Medicare-aged population who require effective new therapies in areas of high unmet medical need, such as late-stage cancers. A policy nominally designed to help Medicare beneficiaries would thus produce the perverse outcome of redirecting innovation away from their most urgent therapeutic needs.
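The reference-price selection described in the Methods can be sketched as follows. All figures below are illustrative stand-ins, not values from the paper's proprietary IQVIA dataset, and the 60% GDP screen is interpreted here as "GDP per capita at least 60% of the US level", which is one plausible reading of the rule:

```python
# Hypothetical sketch of MFN reference-price selection.
# Country names are real; all prices and GDP figures are illustrative.

def mfn_price(us_price, foreign_prices, gdp_ppp, us_gdp_ppp, threshold=0.60):
    """Lowest foreign net price among countries whose PPP-adjusted GDP per
    capita is at least `threshold` of the US level, plus the implied
    reduction in the US net price needed to match it."""
    eligible = {c: p for c, p in foreign_prices.items()
                if gdp_ppp[c] >= threshold * us_gdp_ppp}
    ref = min(eligible.values())
    reduction = 1 - ref / us_price
    return ref, reduction

us_gdp = 80_000                                         # assumed US GDP/capita (PPP)
gdp = {"Germany": 66_000, "France": 58_000, "South Africa": 16_000}
prices = {"Germany": 40.0, "France": 35.0, "South Africa": 12.0}

ref, cut = mfn_price(100.0, prices, gdp, us_gdp)
# South Africa falls below the GDP screen, so France sets the MFN price (35.0),
# implying a 65% cut from the assumed US price of 100.0.
```

The GDP screen matters: without it, the lowest-income country in the basket would mechanically set the reference price.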
Jamieson, L.; Venter, W. D. F.; Meyer-Rath, G.
Introduction: Dolutegravir-based first-line antiretroviral therapy (tenofovir disoproxil fumarate, lamivudine, and dolutegravir; TLD) has delivered substantial clinical and public health benefits. However, sharply decreasing funding for HIV programmes necessitates cost reduction within current treatment guidelines. We evaluated whether replacing tenofovir disoproxil fumarate with tenofovir alafenamide (TAFLD), a drug with equivalent effectiveness and side effect profile, could reduce HIV treatment costs in South Africa. Methods: We conducted a budget-impact analysis over 2026-2030 from the provider perspective. The cost of antiretroviral treatment (ART) provision with either TLD or TAFLD was estimated using ingredients-based costing, including the cost of drugs, laboratory monitoring, staff, consumables, equipment, and overheads. Costs are reported in 2025 USD and are neither discounted nor inflated. Population estimates for adults on first-line therapy were derived from Thembisa 4.8. We modelled a phased transition from TLD to TAFLD over two years, and explored sensitivity to TAFLD price variation (±15%) and inclusion of creatinine monitoring. Results: TAFLD reduced per-patient annual costs by 4-5% compared with TLD (from US$178 to US$169, and from US$287 to US$277, for first and follow-up years, respectively). At full replacement, total programme savings were approximately US$54 million per year (-5%). Even with continued creatinine monitoring, TAFLD remained cost-saving, reducing annual costs by around 4%. Savings increased to 8% if TAFLD prices were 15% lower than base-case assumptions. Conclusions: Replacing TDF with TAF in first-line antiretroviral therapy could generate meaningful cost savings for South Africa with minimal programme disruption. While long-term metabolic effects require consideration, TAFLD represents a feasible interim cost-reduction strategy while awaiting next-generation HIV therapies.
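The headline programme saving follows directly from the per-patient cost difference. A minimal sketch, assuming an illustrative population of 5.4 million adults on first-line ART (an assumption for this sketch, not a Thembisa 4.8 output) and the follow-up-year unit costs quoted above:

```python
# Illustrative budget-impact sketch of the TLD-to-TAFLD switch.
# Unit costs echo the follow-up-year figures quoted in the abstract
# (US$287 TLD vs US$277 TAFLD); the population size is assumed.

def programme_cost(n_patients, share_tafld, cost_tld, cost_tafld):
    """Total annual ART cost given the fraction switched to TAFLD."""
    return n_patients * ((1 - share_tafld) * cost_tld + share_tafld * cost_tafld)

n = 5_400_000                                  # assumed adults on first-line ART
baseline = programme_cost(n, 0.0, 287, 277)    # all patients on TLD
full_switch = programme_cost(n, 1.0, 287, 277) # all patients on TAFLD
savings = baseline - full_switch               # n * (287 - 277) = US$54 million
```

A phased transition is the same calculation with `share_tafld` rising over the two-year switch period.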
Liu, Z.; Liang, Y.; Wang, L. S.; Yu, J.; Liu, J.
Closed-form minimum sample size criteria for developing logistic prediction models, such as the Riley framework implemented in pmsampsize, are widely used but may become optimistic when anticipated discrimination is high. We conducted a Monte Carlo simulation study to compare the formula-based recommended development sample size, n_Riley, with an empirical required sample size, n_req, defined by out-of-sample calibration-slope stability under repeated development sampling. Scenarios fixed the candidate parameter dimension at p = 10 and crossed predictor distribution (normal, standardized skewed continuous, binary), signal density (dense versus sparse), prevalence (φ ∈ {0.05, 0.10, 0.20}), and target discrimination (AUC_target ∈ {0.70, 0.75, 0.80, 0.85, 0.90}), with intercept and signal strength calibrated to match targets. We defined n_req as the smallest n such that E(b_n) ≥ 0.90 and Pr(b_n < 0.80) ≤ 0.20, where b_n is the truth-based logit-scale calibration slope evaluated on a large fixed validation covariate set. At moderate discrimination, n_Riley approximated n_req, but as discrimination increased the formula increasingly underestimated the sample size required for calibration stability, with large deficits at AUC_target = 0.90. Separation-like behavior (extreme fitted risks and linear predictors) at n = n_Riley became common in high-discrimination settings despite nominal convergence, providing a plausible mechanism for formula optimism. These findings support augmenting formula-based planning with targeted simulation stress tests and instability diagnostics when high discrimination is anticipated.
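The empirical stopping rule that defines n_req can be written as a small check over simulated calibration slopes: a candidate n is adequate when E(b_n) ≥ 0.90 and Pr(b_n < 0.80) ≤ 0.20. This is a minimal sketch of the decision rule only, with illustrative slope values, not the authors' simulation code:

```python
# Decision rule for calibration-slope stability, as described above.
# `slopes` holds calibration slopes b_n from repeated development samples
# of size n, each evaluated on a fixed validation covariate set.

def meets_stability_criteria(slopes, mean_floor=0.90, tail_cut=0.80, tail_prob=0.20):
    """True iff mean(b_n) >= 0.90 and the fraction of slopes below 0.80
    is at most 0.20 (both thresholds from the paper's definition)."""
    mean_slope = sum(slopes) / len(slopes)
    tail_fraction = sum(b < tail_cut for b in slopes) / len(slopes)
    return mean_slope >= mean_floor and tail_fraction <= tail_prob

stable   = [0.95, 0.91, 0.88, 0.97, 0.93]   # mean 0.928, none below 0.80
unstable = [0.85, 0.70, 0.95, 0.75, 0.90]   # mean 0.83, 40% below 0.80
```

n_req is then the smallest n whose slope distribution passes this check; in practice one would use far more than five replicates per n.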
Fadikar, A.; Hotton, A.; de Lima, P. N.; Vardavas, R.; Collier, N.; Jia, K.; Rimer, S.; Khanna, A.; Schneider, J.; Ozik, J.
Detailed agent-based simulations are increasingly used to support policy decisions, but their computational cost and complex uncertainty structure make systematic scenario analysis challenging. We present a data-driven, uncertainty-aware decision support (DDUADS) workflow for using stochastic simulation models as decision-support tools under limited computational budgets. The approach combines several established techniques (sensitivity screening, Bayesian calibration using simulation-based inference, and multi-surrogate model integration for translational efficiency) into a coherent pipeline that enables uncertainty-aware policy analysis. Rather than producing a single baseline, the calibration stage yields a posterior distribution over plausible model parameterizations, allowing flexible, uncertainty-aware forward projections. We demonstrate the DDUADS workflow on the INFORM-HIV agent-based model of HIV transmission in Chicago to evaluate potential disruptions in antiretroviral therapy (ART) and pre-exposure prophylaxis (PrEP) use. While the specific application is HIV modeling, the challenges and techniques described here arise in other simulation studies and can be applied to decision support in other domains.
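The "posterior rather than single baseline" idea can be sketched generically: draw parameter sets from the calibrated posterior, run the forward model once per draw, and summarize the resulting band of trajectories. The toy model and posterior below are assumptions for illustration, not part of INFORM-HIV:

```python
# Uncertainty-aware forward projection from a calibrated posterior.
# The forward model is a trivial stand-in for an agent-based simulation.

import random

def forward_model(growth_rate, horizon=10, y0=100.0):
    """Toy stand-in: one trajectory under a single parameter draw."""
    return [y0 * (1 + growth_rate) ** t for t in range(horizon)]

random.seed(0)
# Assumed posterior over the growth-rate parameter (illustrative).
posterior_draws = [random.gauss(0.05, 0.01) for _ in range(200)]
trajectories = [forward_model(r) for r in posterior_draws]

# A projection band at the final time point, instead of one baseline run.
final = sorted(traj[-1] for traj in trajectories)
lo = final[int(0.025 * len(final))]
hi = final[int(0.975 * len(final))]
```

With an expensive simulator, the surrogate models mentioned in the abstract would stand in for `forward_model` when evaluating many posterior draws.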
Ben-Joseph, J.
Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge-Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
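The solver comparison can be reproduced in miniature with a fixed-step SIR integrator. The beta, gamma, and step sizes below are illustrative choices, not the calculator's defaults:

```python
# Forward Euler vs classical RK4 on the deterministic SIR system,
# tracking peak prevalence. S and I are fractions of a closed population.

def sir_rhs(s, i, beta, gamma):
    return -beta * s * i, beta * s * i - gamma * i

def step_euler(s, i, h, beta, gamma):
    ds, di = sir_rhs(s, i, beta, gamma)
    return s + h * ds, i + h * di

def step_rk4(s, i, h, beta, gamma):
    k1 = sir_rhs(s, i, beta, gamma)
    k2 = sir_rhs(s + h/2 * k1[0], i + h/2 * k1[1], beta, gamma)
    k3 = sir_rhs(s + h/2 * k2[0], i + h/2 * k2[1], beta, gamma)
    k4 = sir_rhs(s + h * k3[0], i + h * k3[1], beta, gamma)
    ds = (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]) / 6
    di = (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]) / 6
    return s + h * ds, i + h * di

def peak_prevalence(stepper, h, beta=0.3, gamma=0.1, days=200):
    s, i, peak = 0.999, 0.001, 0.0
    for _ in range(int(days / h)):
        s, i = stepper(s, i, h, beta, gamma)
        peak = max(peak, i)
    return peak

coarse_euler = peak_prevalence(step_euler, h=1.0)   # one-day Euler
fine_rk4 = peak_prevalence(step_rk4, h=0.05)        # fine-step RK4 reference
```

For these parameters (R0 = 3), the analytical peak condition gives a peak prevalence near 0.30, which the fine-step RK4 run recovers closely; the coarse Euler run deviates by a few percent, illustrating the step-size sensitivity reported above.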
Breeze, P. R.; Pidd, K.; Kalbus, A.; Cornelsen, L.; Brown, K. A.; Cummins, S.; Marks, D.; Law, C.; Smith, R.; Tanasache, O.; Er, V.; Forbes, C.; Brennan, A.
Objective: In England, since 2022, large businesses providing food in the out-of-home sector have been required to display calorie information for non-prepacked food and non-alcoholic drink items. This study estimates the long-term cost-effectiveness of the policy by extrapolating real-world evidence on short-term policy effects in England. Design: The lifetime health economic impacts of calorie labelling were simulated using a microsimulation model. The analysis adopted a health systems perspective to compare the policy with a counterfactual no-intervention scenario. The policy may affect calories consumed through consumer behaviour changes and through menu changes, based on observations from real-world evaluations. Estimated changes to daily calorie intake are translated into weight changes. Simulated outcomes include changes in obesity, diabetes cases, cardiovascular events, quality-adjusted life years (QALYs), and National Health Service costs, with probabilistic sensitivity analysis to describe uncertainty. Setting: A synthetic population for England aged 13-79 was generated combining data from the National Diet and Nutrition Survey (2009-19) and the Health Survey for England (2018, 2019). Participants: None. Results: The policy was estimated to generate lifetime incremental costs of -£9.15 per person (95% CI -£31.63, £2.50), i.e., a net cost saving, and incremental QALYs of 0.0021 (95% CI -0.0008, 0.0048) per person. The incremental net benefit at £20,000 per QALY was £50.23 (95% CI -£16.41, £74.68). Greater cost savings and QALY gains were observed in the most deprived groups. Discussion: The out-of-home calorie labelling policy in England is most likely cost-effective, albeit with uncertainty, generating cost savings and marginal health benefits. The results are driven by expected menu changes.
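The headline incremental net benefit is a direct function of the per-person incremental costs and QALYs. A minimal sketch of that calculation (the small gap to the reported mean of £50.23 arises because that figure is averaged over probabilistic draws, whereas this uses the point estimates):

```python
# Incremental net (monetary) benefit at a given willingness-to-pay:
# INB = wtp * delta_QALY - delta_cost. Point estimates are the per-person
# means quoted in the abstract; negative delta_cost means a cost saving.

def incremental_net_benefit(d_qaly, d_cost, wtp=20_000):
    """Net monetary benefit per person at `wtp` GBP per QALY."""
    return wtp * d_qaly - d_cost

inb = incremental_net_benefit(d_qaly=0.0021, d_cost=-9.15)
# 20_000 * 0.0021 - (-9.15) = 51.15 per person
```

A positive INB at the chosen threshold is the criterion behind the "most likely cost-effective" conclusion; in the probabilistic analysis it is evaluated draw by draw.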